A Bayesian Insertion/Deletion Algorithm for Distant Protein Motif Searching via Entropy Filtering
Abstract
Bayesian models have been developed that find ungapped motifs in multiple protein sequences. In this article, we extend the model to allow for deletions and insertions in motifs. Direct generalization of the ungapped algorithm, based on Gibbs sampling, proved unsuccessful because the configuration space became much larger. To alleviate the convergence difficulty, a two-stage procedure is introduced. At the first stage, we develop a method called entropy filtering, which quickly searches for “good” starting points for the alignment without regard to insertion/deletion patterns. At the second stage, we switch to an algorithm that generates both a random vector that represents insertion/deletion patterns and a random variable of motif locations. After the two stages, gapped-motif alignments are obtained for multiple sequences. When applied to datasets that consist of helix–loop–helix proteins and high mobility group proteins, respectively, our methods show great improvements over methods that produce ungapped alignments.
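The abstract does not spell out how entropy filtering or the gapped sampler work. As a rough illustration only, the Python sketch below implements the kind of ungapped Gibbs site sampler the first stage builds on, and uses relative entropy (information content of the motif columns against the background) to keep the best of several short runs as a starting alignment. The function names, the pseudocount, the restart and sweep counts, and this restart-and-rank reading of "entropy filtering" are all assumptions, not the authors' procedure; the second, gapped stage is not shown.

```python
import math
import random

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def column_profile(seqs, starts, width, skip=None, pseudo=1.0):
    """Per-column residue frequencies of the ungapped motif alignment,
    leaving out sequence index `skip` (as the Gibbs update requires)."""
    counts = [dict.fromkeys(ALPHABET, pseudo) for _ in range(width)]
    for i, (s, a) in enumerate(zip(seqs, starts)):
        if i == skip:
            continue
        for j in range(width):
            counts[j][s[a + j]] += 1.0
    profile = []
    for col in counts:
        total = sum(col.values())
        profile.append({r: c / total for r, c in col.items()})
    return profile


def background(seqs, pseudo=1.0):
    """Overall residue frequencies, used as the non-motif model."""
    counts = dict.fromkeys(ALPHABET, pseudo)
    for s in seqs:
        for r in s:
            counts[r] += 1.0
    total = sum(counts.values())
    return {r: c / total for r, c in counts.items()}


def relative_entropy(profile, bg):
    """Sum over motif columns of KL(column || background), in nats."""
    return sum(p * math.log(p / bg[r]) for col in profile for r, p in col.items())


def gibbs_sweep(seqs, starts, width, bg, rng):
    """One sweep of an ungapped site sampler: resample each motif start
    given the motif positions in all other sequences (updates in place)."""
    for i, s in enumerate(seqs):
        prof = column_profile(seqs, starts, width, skip=i)
        weights = []
        for a in range(len(s) - width + 1):
            # Log likelihood ratio of motif model vs. background for window a
            logw = sum(math.log(prof[j][s[a + j]] / bg[s[a + j]]) for j in range(width))
            weights.append(math.exp(logw))
        starts[i] = rng.choices(range(len(weights)), weights=weights)[0]
    return starts


def entropy_filtered_start(seqs, width, n_restarts=20, n_sweeps=10, seed=0):
    """Hypothetical stand-in for 'entropy filtering': run short ungapped
    chains from random starts and return the alignment with the highest
    relative entropy, to seed a later gapped stage."""
    rng = random.Random(seed)
    bg = background(seqs)
    best, best_score = None, -math.inf
    for _ in range(n_restarts):
        starts = [rng.randrange(len(s) - width + 1) for s in seqs]
        for _ in range(n_sweeps):
            gibbs_sweep(seqs, starts, width, bg, rng)
        score = relative_entropy(column_profile(seqs, starts, width), bg)
        if score > best_score:
            best, best_score = list(starts), score
    return best, best_score
```

Sequences must use only the 20 residue letters in `ALPHABET` and be at least `width` long. Ranking runs by relative entropy favors alignments whose columns are far from the background composition, which is one plausible way to pick "good" starting points cheaply before allowing gaps.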
Similar resources
Markovian Structures in Biological Sequence
SUMMARY The alignment of multiple homologous biopolymer sequences is crucial in research on protein modeling and engineering, molecular evolution, and the prediction of both gene function and gene product structure. In this article, we provide a coherent view of the two recent models used for multiple sequence alignment, the hidden Markov model (HMM) and the block-based motif model, in order...
Determination of Maximum Bayesian Entropy Probability Distribution
In this paper, we consider methods for determining maximum entropy multivariate distributions with a given prior under the constraint that the marginal distributions, or the marginals and the covariance matrix, are prescribed. Next, numerical solutions are considered for cases where no closed-form solution is available. Finally, these methods are illustrated with numerical examples.
Bayesian Phylogenetic Inference under a Statistical Insertion-Deletion Model
A central problem in computational biology is the inference of phylogeny given a set of DNA or protein sequences. Currently, this problem is tackled stepwise, with phylogenetic reconstruction dependent on an initial multiple sequence alignment step. However, these two steps are fundamentally interdependent. Whether the main interest is in sequence alignment or phylogeny, a major goal of computat...
Identification of motifs with insertions and deletions in protein sequences using self-organizing neural networks
The problem of motif identification in protein sequences has been studied for many years in the literature. Current popular algorithms for motif identification in protein sequences face two difficulties: high computational cost and the possibility of insertions and deletions. In this paper, we provide a new strategy that solves the problem more efficiently. We develop a self-organizing neural net...
The study on the spam filtering technology based on Bayesian algorithm
This paper analyzes spam filtering technology, carries out a detailed study of the Naive Bayes algorithm, and proposes an improved Naive Bayesian mail filtering technique. The improvements lie in text selection as well as feature extraction. General Bayesian text classification algorithms mostly use information gain and cross-entropy for feature selection. Through the principle o...